- A68k - a freely distributable assembler for the Amiga
-
- by Charlie Gibbs
-
- with special thanks to
- Brian R. Anderson and Jeff Lydiatt
-
- (Version 1.02 - September 9, 1987)
-
- Note: This program is NOT Public Domain. Permission is given to freely
- distribute this program provided no fee is charged, and this
- documentation file is included with the program.
-
- This assembler is based on Brian R. Anderson's 68000 cross-assembler
- published in Dr. Dobb's Journal, April through June 1986. I have converted
- it to produce AmigaDOS-format object modules, and have made many enhancements,
- such as macros and include files.
-
- My first step was to convert the original Modula-2 code into C.
- I did this for two reasons. First, I had access to a C compiler, but
- not a Modula-2 compiler. Second, I like C better anyway.
-
- The executable code generator (GetObjectCode and MergeModes) is
- essentially the same as in the original article, aside from its translation
- into C. I have almost completely rewritten the remainder of the code,
- however, in order to remove restrictions, add enhancements, and adapt it to
- the AmigaDOS environment. Since the only reference book available to me
- was the AmigaDOS Developer's Manual (Bantam, February 1986), the assembler
- and the remainder of this document work in terms of that book.
-
-
- RESTRICTIONS
-
- Let's get these out of the way first. There are a few things that I
- have not yet implemented, and some outright bugs that would take too long
- to correct for this version.
-
- o The verification file (-v) option is not supported. Diagnostic
- messages always appear on the console. They also appear in the
- listing file, however (see extensions below).
-
- o The directory names in the include directory list (-i) must be
- separated by commas. The list may not be enclosed in quotes.
-
- o Labels assigned by EQUR and REG directives are case-sensitive.
-
- o The following directives are not supported, and will be flagged as
- invalid op-codes:
-
- RORG
- OFFSET
- NOPAGE
- LLEN
- PLEN
- NOOBJ
- FAIL
- FORMAT
- NOFORMAT
- MASK2
-
- I feel that NOPAGE, LLEN, and PLEN should not be defined within a
- source module. It doesn't make sense to me to have to change your
- program just because you want to print your listings on different
- paper. The command-line option "-p" (see below) can be used as a
- replacement for PLEN.
-
-
- EXTENSIONS
-
- Now for the good stuff:
-
- o Labels can be any length that will fit onto one source line
- (currently 127 bytes maximum). Since labels are stored on the
- heap, the number of labels that can be processed is limited only
- by available memory, which can be increased by using the "-w"
- option (see below).
-
- o Since section data and user macro definitions are stored on the
- same heap as the symbol table (see above), they too are limited
- only by available memory. (Actually, there is a hard-coded limit
- of 32767 sections, but I doubt anyone will run into that one.)
-
- o The only values a label cannot take are the register names - the
- assembler can distinguish between the same name used as a label,
- instruction name, macro name, directive, or section name.
-
- o Section and user macro names appear in the symbol table dump, and
- will also be cross-referenced. Their names can be the same as any
- label (see above); the assembler can sort them out.
-
- o Includes and macro calls can be nested indefinitely, limited only
- by available memory. The message "Secondary heap overflow -
- assembly terminated" will be displayed if memory is exhausted.
- You can increase the size of this heap using the -w parameter
- (see below). Recursive macros are supported; recursive includes
- will, of course, result in a loop that will be broken only when
- the heap overflows.
-
- o The EVEN directive forces alignment on a word (2-byte) boundary.
- It does the same thing as CNOP 0,2.
- (This one is left over from the original code.)
-
- o Branch (Bcc) instructions to a previously-defined label will be
- automatically converted to short form if possible. This feature is
- not available for forward branches, since in pass 1 the assembler
- doesn't yet know how far the branch must go.
-
- o If a MOVEM instruction only specifies one register, it is converted
- to the corresponding MOVE instruction. Instructions of the form
- MOVEM D0-D0,label will not be converted, however.
-
- o ADD, SUB, and MOVE instructions will be converted to ADDQ, SUBQ,
- and MOVEQ respectively if possible. Instructions coded explicitly
- as (for example) ADDA or ADDI will not be converted.
-
- o ADD, CMP, SUB, and MOVE to an address register are converted to
- ADDA, CMPA, SUBA, and MOVEA respectively, except if an ADD, SUB,
- or MOVE instruction has already been converted to quick form.
-
- o ADD, AND, CMP, EOR, OR, and SUB of an immediate value are converted
- to ADDI, ANDI, CMPI, EORI, ORI, and SUBI respectively (unless the
- address register or quick conversion above has already been done).
-
- o If both operands of a CMP instruction are postincrement mode, the
- instruction is converted to CMPM.
-
- o The SECTION directive allows a third parameter. This can be
- specified as either CHIP or FAST (upper- or lower-case). If this
- parameter is present, the hunk will be written with the MEMF_CHIP
- or MEMF_FAST bit set. This allows you to produce "pre-ATOMized"
- object modules.
-
- o The synonyms DATA and BSS are accepted for SECTION directives
- starting data or BSS hunks. A section name is mandatory for
- all non-CODE hunks.
-
- o The ability to produce Motorola S-records is retained from the
- original code. The -s option causes the assembler to produce
- S-format instead of AmigaDOS format. Relocatable code cannot be
- produced in this format.
-
- o Error messages include the name of the source, macro, or include
- module that contains the statement in error, plus the line number
- within the module of the offending line. If a statement has
- multiple errors, this information appears only on the first
- error message for the statement.
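-
- The "if possible" conditions on the automatic conversions above come
- down to simple range checks. The sketch below is illustrative only: the
- function names are hypothetical and are not taken from the A68k source.
- The ranges themselves are those of the 68000 instruction encodings.

```c
/* Range checks behind the automatic conversions listed above.
   All function names here are hypothetical. */

/* Bcc short form: 8-bit signed displacement measured from the
   instruction address plus 2.  A displacement byte of zero is reserved
   to mean "a 16-bit displacement follows", so it cannot be used. */
int FitsShortBranch(long instr_addr, long target)
{
    long disp = target - (instr_addr + 2);
    return disp != 0 && disp >= -128 && disp <= 127;
}

/* ADDQ and SUBQ carry an immediate value from 1 to 8. */
int FitsAddqSubq(long value)
{
    return value >= 1 && value <= 8;
}

/* MOVEQ carries an 8-bit signed immediate, sign-extended to 32 bits. */
int FitsMoveq(long value)
{
    return value >= -128 && value <= 127;
}
```

- Note that MOVEQ also requires a data register destination and a long
- operation, so a real check would have to look at more than the value.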
-
-
- HOW TO USE IT
-
- The command-line syntax to run the assembler is as follows:
-
- a68k <source file>
- [-e<equate file>]
- [-h<header file>]
- [-i<include dirlist>]
- [-l<listing file>]
- [-o<object file>]
- [-p<page depth>]
- [-d]
- [-s]
- [-w[<primary-heap-size>][,<secondary-heap-size>]]
- [-x]
-
- These options can be given in any order, so if you like to specify your
- switches first, you can. Option values, if any, must immediately follow
- the keyword with no intervening spaces.
-
- If the -o keyword is omitted, the object file will be given a default
- name. It is created by replacing all characters after the last period in
- the source file name by "o". For example, if the source file name is
- "myprog.asm", the object file name defaults to "myprog.o". A source name
- of "my.new.prog.asm" produces a default object file name of "my.new.prog.o".
- If the source file name does not contain a period, ".o" is appended to it
- to produce the default object file name.
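-
- The naming rule above can be sketched in a few lines of C. DefaultName
- is a hypothetical helper written for illustration, not the routine A68k
- actually uses. It does not treat a period inside a directory name
- specially.

```c
#include <stdio.h>
#include <string.h>

/* Build a default output name from a source name: replace everything
   after the last period with ext, or append "." and ext if the source
   name contains no period.  (Hypothetical helper.) */
void DefaultName(const char *src, const char *ext, char *out, size_t outlen)
{
    const char *dot = strrchr(src, '.');
    size_t stem = dot ? (size_t)(dot - src) : strlen(src);

    snprintf(out, outlen, "%.*s.%s", (int)stem, src, ext);
}
```

- Called with "my.new.prog.asm" and "o", this yields "my.new.prog.o";
- called with "myprog" and "o", it yields "myprog.o".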
-
- The default value for the listing file name is arrived at in the same
- way as the object file name, except that ".lst" is appended instead of ".o".
- If you don't specify this parameter, no listing file will be produced.
- If you specify -x (see below), -l (with the default name) is assumed,
- although you can still use this parameter if you wish.
-
- The default value for the equate file name is arrived at in the same
- way as the object file name, except that ".equ" is appended instead of ".o".
-
- The include directory list is a list of directory names separated by
- commas. No embedded blanks are allowed. For example, the specification
- -imylib,df1:another.lib
- will cause include files to be searched for first in the current directory,
- then in "mylib", then in "df1:another.lib".
-
- The -d keyword causes symbol table entries (hunk_symbol) to be written
- to the object module for the use of symbolic debuggers.
-
- The -p keyword causes the page depth to be set to the specified value.
- If omitted, a default of 60 lines (-p60) is assumed.
-
- The -s keyword, if specified, causes the object file to be written in
- Motorola S-record format. If omitted, AmigaDOS format will be produced.
- The default name for an S-record file has ".s" appended to the source name,
- rather than ".o"; this can still be overridden with the -o keyword, though.
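-
- For readers unfamiliar with S-records, the sketch below formats one S1
- data record (16-bit address) following the standard Motorola layout:
- type, byte count, address, data, checksum. FormatS1 is a hypothetical
- name. This document does not say which record types A68k emits, so
- take this as an illustration of the format only.

```c
#include <stdio.h>

/* Format one S1 record into out and return its length in characters.
   The byte count covers the 2 address bytes, the data, and the checksum,
   so len must not exceed 252.  The checksum is the one's complement of
   the low byte of the sum of the count, address, and data bytes.
   (Hypothetical helper; out must hold 2*len + 11 characters.) */
int FormatS1(unsigned addr, const unsigned char *data, int len, char *out)
{
    unsigned count = (unsigned)len + 3;
    unsigned sum = count + ((addr >> 8) & 0xFF) + (addr & 0xFF);
    int i, n;

    n = sprintf(out, "S1%02X%04X", count, addr & 0xFFFF);
    for (i = 0; i < len; i++) {
        n += sprintf(out + n, "%02X", (unsigned)data[i]);
        sum += data[i];
    }
    n += sprintf(out + n, "%02X", ~sum & 0xFF);
    return n;
}
```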
-
- The -w keyword specifies the size of the heaps used. The primary heap
- stores the symbol table, user macro text, relocation information, and
- cross-reference information. The secondary heap stores information for
- nested macro calls and include files. The primary heap size defaults to
- 32768 bytes, which should be enough for all but the largest assemblies.
- The secondary heap size defaults to 1024 bytes, which should be enough
- unless you use very deeply nested macros and/or include files with long
- path names. You can specify either or both parameters. For example:
-      -w40000         secondary heap size remains at 1024 bytes
-      -w,2000         primary heap size remains at 32768 bytes
-      -w40000,2000    increases the size of both heaps
- If you're really tight for memory, and are assembling small modules, you
- can use this keyword to shrink the heaps below their default sizes.
- At the end of an assembly, a message will be displayed giving the
- amount of heap space actually used, in the form of the -w command
- you would have to enter to allocate the minimum heap space.
- See below for a layout of the heaps.
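-
- Parsing the -w value is straightforward; a minimal sketch follows.
- ParseHeapSizes is a hypothetical name, and the rejection of zero or
- negative sizes is an assumption rather than documented A68k behaviour.

```c
#include <stdlib.h>

/* Parse the text that follows "-w": [<primary>][,<secondary>].
   A missing field leaves the corresponding size unchanged, so the
   caller should preload both with the defaults.  Returns 1 on success
   and 0 on a malformed value (in which case the primary size may
   already have been updated).  (Hypothetical helper.) */
int ParseHeapSizes(const char *arg, long *primary, long *secondary)
{
    char *end;
    long n;

    if (*arg != ',' && *arg != '\0') {      /* primary size present */
        n = strtol(arg, &end, 10);
        if (end == arg || n <= 0)
            return 0;
        *primary = n;
        arg = end;
    }
    if (*arg == ',') {                      /* secondary size present */
        arg++;
        n = strtol(arg, &end, 10);
        if (end == arg || n <= 0)
            return 0;
        *secondary = n;
        arg = end;
    }
    return *arg == '\0';                    /* no trailing junk allowed */
}
```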
-
- The -x keyword will produce a symbol table dump, including
- cross-reference information. If you haven't also specified -l (with
- or without a file name), -l with the default file name will be assumed.
-
-
- If you wish to override the default object and (optionally) listing
- file names, you can omit the -o and -l keywords. The assembler interprets
- the first three parameters without leading hyphens as the source, object,
- and listing file names respectively. Anything over three file names is an
- error, as is attempting to respecify a file name with the -o or -l keywords.
-
-
- The primary heap is built from both ends. Symbol table entries
- (including labels) and macro text are stored during pass 1. Cross-reference
- data is stored during pass 2. Relocation information is also stored during
- pass 2, but is cleared at the end of each SECTION. Since it is no longer
- needed once dumped, the space is freed for re-use by the next section's
- relocation information. The expression parser also uses the primary heap
- to store its working stacks - this space is freed as soon as an expression
- has been evaluated.
- The fixed portion of each symbol table entry occupies 16 bytes. The
- labels and macro text occupy just enough space to hold their strings
- (including the end-of-string delimiter) - they are all pointed to by fixed
- symbol table entries. Relocation entries occupy 10 bytes each.
- Cross-reference entries are 12 bytes long - each holds four references to
- one symbol. The expression parser creates temporary entries for terms
- (10 bytes each) and operators (4 bytes each). Since terms are combined
- as soon as possible, the parser almost never needs to store the entire
- expression on the heap.
- The diagram below illustrates the layout of the primary heap. High
- memory addresses are at the top of the diagram, while low addresses are
- at the bottom. The names on the left of the diagram are the names of the
- pointers to the various tables within the heap.
-
- Heap + maxheap --------------->  ___________________________
-                                 |                           |
-                                 |       Symbol table        |
- struct SymTab *SymStart ------> |___________________________|
-                                 |                           |
-                                 |     Symbol references     |
- struct Ref *RefStart ---------> |___________________________|
-                                 |                           |
-                                 |      (unused space)       |
- char *HeapLim ----------------> |___________________________|
-                                 |                           |
-                                 |      Relocation data      |
- struct RelTab *RelStart ------> |___________________________|
-                                 |                           |
-                                 |   Labels and macro text   |
- char *Heap -------------------> |___________________________|
-
- Note that the pointers are to various types. This makes for
- lots of interesting casts. (Ain't C fun?) Since the relocation
- data is cleared at the end of each section, HeapLim will move up and
- down. The "high-water mark" is stored in char *HighHeap, which is
- used solely to produce the memory usage message at the end of the
- assembly. Note that a program may consist of a section containing
- many relocatable references, followed by a section with fewer
- relocatable references but lots of symbol references. In this case,
- RefStart might end up below HighHeap, and the final message would
- indicate that more heap space was used than was available. This is
- not an error - only if RefStart hits HeapLim will an error be reported.
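-
- The essential idea of a heap "built from both ends" can be shown with
- a much simpler structure than the one diagrammed above. In the sketch
- below, all names (TwoEnded, HeapInit, AllocLow, AllocHigh) are
- hypothetical; the real heap keeps several tables on each side, while
- this version keeps just one region per end. An allocation fails only
- when the two ends would meet, which corresponds to RefStart hitting
- HeapLim.

```c
#include <stddef.h>

/* A buffer allocated from both ends.  (Hypothetical, simplified.) */
struct TwoEnded {
    char *low;      /* next free byte at the bottom; grows upward      */
    char *high;     /* lowest byte in use at the top; grows downward   */
};

void HeapInit(struct TwoEnded *h, char *buf, size_t size)
{
    h->low = buf;
    h->high = buf + size;
}

/* Take n bytes from the bottom (e.g. label text).  NULL if full. */
char *AllocLow(struct TwoEnded *h, size_t n)
{
    char *p = h->low;

    if ((size_t)(h->high - h->low) < n)
        return NULL;
    h->low += n;
    return p;
}

/* Take n bytes from the top (e.g. a fixed-size table entry). */
char *AllocHigh(struct TwoEnded *h, size_t n)
{
    if ((size_t)(h->high - h->low) < n)
        return NULL;
    h->high -= n;
    return h->high;
}
```

- The attraction of this design is that neither side needs a size limit
- of its own: variable-length strings and fixed-size entries share
- whatever space is left between them.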
-
-
- The secondary heap is also built from both ends, but it grows and
- shrinks according to how many macros and include files are currently open.
- At all times there will be at least one entry on the heap, for the original
- source code file.
- The bottom of the heap holds the names of the source code file and
- any macro or include files that are currently open. The full path is
- given. A null string is stored for user macros. Macro arguments are
- stored as additional strings, one for each argument in the macro call line.
- All strings are stored in minimum space, similar to the labels and user
- macro text on the primary heap. File names are pointed to by the fixed
- table entries (see below) - macro arguments are accessed by stepping past
- the macro name to the desired argument, unless NARG would be exceeded.
- The fixed portion of the heap is built down from the top. Each entry
- occupies 16 bytes. Enough information is stored to return to the proper
- position in the outer file once the current macro or include file has been
- completely processed.
- The diagram below illustrates the layout of the secondary heap.
-
- Heap2 + maxheap2 ------------->  ___________________________
-                                 |                           |
-                                 |     Input file table      |
- struct InFCtl *InF -----------> |___________________________|
-                                 |                           |
-                                 |   Parser operator stack   |
- struct OpStack *Ops ----------> |___________________________|
-                                 |                           |
-                                 |      (unused space)       |
- struct TermStack *Term -------> |___________________________|
-                                 |                           |
-                                 |     Parser term stack     |
- char *NextFNS ----------------> |___________________________|
-                                 |                           |
-                                 |   Input file name stack   |
- char *Heap2 ------------------> |___________________________|
-
- The "high-water mark" for NextFNS is stored in char *High2,
- and the "low-water mark" (to stretch a metaphor) for InF is stored
- in struct InFCtl *LowInF. Again, these figures are used only to
- determine the maximum heap usage.
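-
- Stepping past the macro name to reach an argument, as described above,
- amounts to walking a run of consecutive NUL-terminated strings. The
- sketch below uses a hypothetical function name, and returning an empty
- string when the argument number exceeds NARG is an assumption; this
- document does not say what A68k substitutes in that case.

```c
#include <string.h>

/* names points at a macro's name string (empty for a user macro),
   followed immediately by one NUL-terminated string per argument.
   Returns argument n (counting from 1), or "" if n is out of range.
   (Hypothetical helper.) */
const char *MacroArg(const char *names, int n, int narg)
{
    if (n < 1 || n > narg)
        return "";
    while (n-- > 0)
        names += strlen(names) + 1;     /* step past one string */
    return names;
}
```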
-
-
- Please send me any bug reports, flames, etc. I can be reached on
- Dorean BBS (604/432-8579), Mind Link (604/533-2312), at any Panorama
- (PAcific NORthwest AMiga Association) meeting, or via Jeff Lydiatt
- or Larry Phillips. (I don't have the time or money to live on
- Usenet or CompuServe, etc.)
-
- Charlie Gibbs
- #21 - 21555 Dewdney Trunk Road
- Maple Ridge, B.C. CANADA
- V2X 3G6
-
-
- P.S. I plan to add 68010/68020 support in the future. Stay tuned.
-